Point Pattern Analysis

HES 505 Fall 2022: Session 16

Matt Williamson

Objectives

  • Define a point process and its utility for ecological applications

  • Define first- and second-order Complete Spatial Randomness

  • Use several common functions to explore point patterns

  • Leverage point patterns to interpolate missing data

What is a point pattern?

  • Point pattern: A set of events within a study region (i.e., a window) generated by a random process

  • Set: A collection of mathematical events

  • Events: The existence of a point object of the type we are interested in at a particular location in the study region

  • A marked point pattern refers to a point pattern where the events have additional descriptors

Some notation:

  • \(S\): refers to the entire set

  • \(\mathbf{s_i}\) denotes the vector of data describing point \(s_i\) in set \(S\)

  • \(\#(S \in A )\) refers to the number of points in \(S\) within study area \(A\)

Requirements for a set to be considered a point pattern

  • The pattern must be mapped on a plane to preserve distance

  • The study area, \(A\), should be objectively determined

  • There should be a \(1:1\) correspondence between objects in \(A\) and events in the pattern

  • Events must be proper, i.e., refer to actual locations of the event

  • For some analyses the pattern should be a census of the relevant events

Analyzing Point Patterns

  • Modeling random processes means we are interested in probability densities of the points (first-order)

  • Also interested in how the presence of some events affects the probability of other events (second-order)

  • Finally interested in how the attributes of an event affect location (marked)

Analyzing Point Patterns

Kernel Density Estimates (KDE)

\[ \begin{equation} \hat{f}(x) = \frac{1}{nh_xh_y} \sum_{i=1}^n k\bigg(\frac{{x-x_i}}{h_x},\frac{{y-y_i}}{h_y} \bigg) \end{equation} \]

  • Assume each location \(\mathbf{s_i}\) is drawn from an unknown distribution

  • Distribution has probability density \(f(\mathbf{x})\)

  • Estimate \(f(\mathbf{x})\) by averaging probability “bumps” around each location

  • Most operations in R require a dedicated point-pattern object class (see as.ppp in spatstat)
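As a sketch, raw coordinates can be converted to a `ppp` object before any analysis; this assumes the `spatstat` package, and the coordinates and window below are illustrative:

```r
library(spatstat)

# coordinates of 20 simulated events and an observation window (the study area A)
x <- runif(20)
y <- runif(20)
win <- owin(xrange = c(0, 1), yrange = c(0, 1))

# build the point pattern; as.ppp() can also convert other spatial objects
pts <- ppp(x = x, y = y, window = win)
npoints(pts)  # number of events in the set
```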

Kernel Density Estimates (KDE)

  • \(h\) is the bandwidth and \(k\) is the kernel

  • We can use density (which dispatches to spatstat’s density.ppp for point patterns) to explore

library(spatstat)

x <- rpoispp(lambda = 50)                     # simulate a Poisson point pattern
K1 <- density(x, sigma = 2)                   # sigma sets the bandwidth
K2 <- density(x, sigma = 5)
K3 <- density(x, sigma = 2, kernel = "disc")  # flat (disc) kernel

Choosing bandwidths and kernels

  • Small values for \(h\) give ‘spiky’ densities

  • Large values for \(h\) smooth much more

  • Data-driven selectors can choose a bandwidth for a given kernel

  • tmap package (later) provides additional functionality
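The bandwidth choice can also be automated; a sketch using `spatstat`’s cross-validation selector `bw.diggle()` on a simulated pattern:

```r
library(spatstat)

x <- rpoispp(lambda = 50)            # simulated Poisson pattern
sigma_opt <- bw.diggle(x)            # cross-validated bandwidth for the Gaussian kernel
K_opt <- density(x, sigma = sigma_opt)
plot(K_opt, main = "KDE with CV-selected bandwidth")
```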

Second-Order Analysis

  • KDEs assume independence of points (first-order randomness)

  • Second-order methods allow dependence amongst points (second-order randomness)

  • Several functions for assessing second order dependence (\(K\), \(L\), and \(G\))

Ripley’s \(K\) Function

  • If points have independent, fixed marginal densities, then they exhibit complete spatial randomness (CSR)

\[ \begin{equation} K(d) = \lambda^{-1}E(N_d) \end{equation} \]

where \(N_d\) is the number of additional events within distance \(d\) of a randomly chosen event and \(\lambda\) is the intensity (expected events per unit area)

  • We can test for clustering by comparing to the expectation:

\[ \begin{equation} K_{CSR}(d) = \pi d^2 \end{equation} \]

  • If \(K(d) > K_{CSR}(d)\), then there is clustering at the scale defined by \(d\)
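For a homogeneous Poisson pattern, the estimated \(K\) should track \(\pi d^2\); `spatstat`’s `Kest()` returns both the estimate and the theoretical CSR curve, so the comparison can be sketched as (the simulated pattern here is illustrative):

```r
library(spatstat)

x <- rpoispp(lambda = 100)
kf <- Kest(x, correction = "border")

# kf$theo is pi * d^2; kf$border is the border-corrected estimate
head(data.frame(d = kf$r, Khat = kf$border, K_csr = kf$theo))
```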

Ripley’s \(K\) Function

  • When working with a sample the distribution of \(K\) is unknown

  • Estimate with

\[ \begin{equation} \hat{K}(d) = \hat{\lambda}^{-1}\sum_{i=1}^n\sum_{j \neq i}\frac{I(d_{ij} < d)}{n-1} \end{equation} \]

where:

\[ \begin{equation} \hat{\lambda} = \frac{n}{|A|} \end{equation} \]
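A minimal sketch of computing \(\hat{K}(d)\) by hand in base R plus `spatstat` (no edge correction, so it will understate \(K\) near the window boundary; the pattern and distance are illustrative):

```r
library(spatstat)

x <- rpoispp(lambda = 100)            # pattern on the unit square
n <- npoints(x)
lambda_hat <- n / area(Window(x))     # lambda-hat = n / |A|

d <- 0.1
D <- pairdist(x)                      # n x n matrix of pairwise distances
npairs <- sum(D < d) - n              # drop the zero diagonal (i = j pairs)

K_hat <- (1 / lambda_hat) * npairs / (n - 1)
```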

Ripley’s \(K\) Function

  • Using the spatstat package

library(spatstat)
data(bramblecanes)

kf <- Kest(bramblecanes, correction = "border")
plot(kf)
plot(kf)

Ripley’s \(K\) Function

  • accounting for sampling variation in \(\hat{K}(d)\) with simulation envelopes

kf.env <- envelope(bramblecanes, correction = "border")  # simulates 99 CSR patterns by default
plot(kf.env)

Other functions

  • \(L\) function: square root transformation of \(K\)

  • \(G\) function: cumulative distribution of nearest neighbor distances
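Both have built-in estimators in `spatstat`; a sketch using the same `bramblecanes` data as above:

```r
library(spatstat)
data(bramblecanes)

lf <- Lest(bramblecanes, correction = "border")  # L(d) = sqrt(K(d)/pi)
gf <- Gest(bramblecanes)                         # CDF of nearest-neighbor distances
par(mfrow = c(1, 2))
plot(lf)
plot(gf)
```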

Interpolation

  • Goal: estimate the value of \(z\) at new locations \(\mathbf{x}\), given observations at the points \(\mathbf{x_i}\)

  • Most useful for continuous values

  • Nearest-neighbor, Inverse Distance Weighting, Kriging

Nearest neighbor

  • find \(i\) such that \(| \mathbf{x_i} - \mathbf{x}|\) is minimized
  • The estimate of \(z\) is \(z_i\)
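A base-R sketch of nearest-neighbor interpolation; the observed locations, values, and prediction location below are made up for illustration:

```r
# observed locations and values
obs <- data.frame(x = c(0.1, 0.5, 0.9),
                  y = c(0.2, 0.8, 0.4),
                  z = c(10, 20, 30))

# location to predict
new <- c(x = 0.6, y = 0.7)

# Euclidean distance to each observation; take z from the closest point
d <- sqrt((obs$x - new["x"])^2 + (obs$y - new["y"])^2)
z_hat <- obs$z[which.min(d)]
z_hat  # nearest point is (0.5, 0.8), so z_hat = 20
```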

Inverse-Distance Weighting

  • Weight closer observations more heavily

\[ \begin{equation} \hat{z}(\mathbf{x}) = \frac{\sum_{i=1}^n w_iz_i}{\sum_{i=1}^n w_i} \end{equation} \]

where the weights decay with distance, e.g. \(w_i = \|\mathbf{x} - \mathbf{x_i}\|^{-p}\)
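Continuing the same toy data, a base-R sketch of IDW with inverse-squared-distance weights (the power of 2 is an assumption; common choices range from 1 to 3):

```r
# observed locations and values (illustrative)
obs <- data.frame(x = c(0.1, 0.5, 0.9),
                  y = c(0.2, 0.8, 0.4),
                  z = c(10, 20, 30))
new <- c(x = 0.6, y = 0.7)

d <- sqrt((obs$x - new["x"])^2 + (obs$y - new["y"])^2)
w <- 1 / d^2                      # inverse-distance weights
z_hat <- sum(w * obs$z) / sum(w)  # weighted average, pulled toward nearby values
```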